[Feature] Stop loading of page #3238

Closed
@RebliNk17

Description

@RebliNk17

In Chrome, you can cancel a page load by clicking the X that replaces the refresh button while the page is loading.

Some websites never finish loading; even with a 90s timeout I keep getting timeout errors.

If there were an option to stop loading the page (as there is in Chrome), I could get the content that was already loaded and prevent Puppeteer from throwing a timeout.

I tried to use page.keyboard.press('Escape'); but with no luck.

Another solution would be to stop loading the page after X ms with something like this:

page.setPageLimitLoadingTime(30000);
which would stop the page from continuing to load and return all the data it has already received.

Chromium API reference:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-stopLoading

Tell us about your environment:

Thank you.

** if there already is an option for my proposal I'm sorry, I just couldn't find anything...

Activity

aslushnikov

aslushnikov commented on Sep 13, 2018

@aslushnikov
Contributor

@RebliNk17 there's a window.stop method, would it be helpful to you?

await page.evaluate(() => window.stop());
RebliNk17

RebliNk17 commented on Sep 13, 2018

@RebliNk17
Author

Correct me if I'm wrong, but I think the evaluate function only runs after page.goto finishes...

Anyway, it's not working.

Something like this is partially working:
await page._client.send("Page.stopLoading");
but I cannot find a way to tell Puppeteer that the page has finished loading...
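
For readers on a recent Puppeteer release, here is a minimal sketch of the same call made through a public CDP session rather than the private page._client. It assumes page.createCDPSession() is available in the installed version (older releases expose page.target().createCDPSession() instead). stopLoading is a hypothetical helper name, not a Puppeteer API.

```javascript
// Sketch only: `page` is expected to be a Puppeteer Page.
// `stopLoading` is a hypothetical helper name, not part of Puppeteer.
async function stopLoading(page) {
  // Assumption: page.createCDPSession() exists in the installed version.
  const session = await page.createCDPSession();
  try {
    // The protocol method linked in the issue description.
    await session.send('Page.stopLoading');
  } finally {
    // Detach so the extra session does not linger after the call.
    await session.detach();
  }
}
```

This only issues the protocol command. As reported in this thread, a pending page.goto that waits for networkidle0 may still run until its own timeout.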

RebliNk17

RebliNk17 commented on Sep 16, 2018

@RebliNk17
Author

I don't know how or why, but this code now works:
await page._client.send("Page.stopLoading");

It stops loading the page and returns all the data from goto...

A few days ago it didn't return any data and threw a timeout, but now it does...

RebliNk17

RebliNk17 commented on Sep 16, 2018

@RebliNk17
Author

Sorry, it's not working as I thought.
When using networkidle0 or networkidle2 as the waitUntil option in page.goto, I am still getting a timeout.

When using 'domcontentloaded' or 'load', I'm not getting all the data from some websites, but then I'm not getting timeout errors.

@aslushnikov Any thoughts on how to do it?

I've tried this:
https://github.com/RebliNk17/puppeteer/blob/master/lib/Page.js

But I'm still missing something...

aslushnikov

aslushnikov commented on Sep 16, 2018

@aslushnikov
Contributor

@RebliNk17 what do you expect to see when you "stop" loading?

If you just want the navigation promise to not hang, I'd implement stopping somehow like this:

let stopCallback = null;
const stopPromise = new Promise(x => stopCallback = x);

const navigationPromise = Promise.race([
  page.goto(url).catch(e => void e),
  stopPromise
]);

// Do something; once you want to "stop" navigation, call `stopCallback`.
stopCallback();
RebliNk17

RebliNk17 commented on Sep 16, 2018

@RebliNk17
Author

@aslushnikov
I'll try to explain my problem better.

What I want is to receive the website's HTML content and HTTP requests from the page.goto promise after X seconds have passed, without an exception, even if the page did not finish loading.

Currently, if the page did not finish loading, a timeout exception is thrown and no data (HTML, HTTP requests, etc.) is returned.

Expected result:
When stopLoading is called, the page will stop all process (Just like when pressing ESC or the X button on a regular browser) and will "display" all the content that has been loaded until that press.

Is it clearer now?
If not, I will create a short video to explain it (English is not my native language).

aslushnikov

aslushnikov commented on Sep 16, 2018

@aslushnikov
Contributor

Is it clearer now?

@RebliNk17 I'm still not sure what's not working.

Expected result:
When stopLoading is called, the page will stop all process (Just like when pressing ESC or the X button on a regular browser) and will "display" all the content that has been loaded until that press.

So the following approach should yield the expected result:

  • the await page._client.send("Page.stopLoading"); will stop page loading, as if you hit "X" in the browser
  • you can get page's content after that with await page.content()
  • you can catch and ignore Timeout exception from page.goto:
await page.goto(url).catch(e => void e), // catch and ignore exception

So what's not working?

vsemozhetbyt

vsemozhetbyt commented on Sep 16, 2018

@vsemozhetbyt
Contributor

Maybe page.goto(url, { waitUntil: 'domcontentloaded' }) will suffice?

RebliNk17

RebliNk17 commented on Sep 17, 2018

@RebliNk17
Author

Maybe page.goto(url, { waitUntil: 'domcontentloaded' }) will suffice?

That doesn't load all the JavaScript on the page.

  • the await page._client.send("Page.stopLoading"); will stop page loading, as if you hit "X" in the browser

When using networkidle0 or networkidle2, that's not enough; it will still hang and throw a timeout exception.

await page.goto(url).catch(e => void e), // catch and ignore exception

This will still hang until the timeout.

What I found to work is something like this.
I changed the code a little in lib\Page.js#goto:

  async goto(url, options = {}) {
    ......

    const pageLoadingStoppedFunc = pageLoadingStopped.bind(this);

    let ensureNewDocumentNavigation = false;
    let error = await Promise.race([
      navigate(this._client, url, referrer),
      watcher.timeoutOrTerminationPromise(),
      pageLoadingStoppedFunc()
    ]);
    if (!error) {
      error = await Promise.race([
        watcher.timeoutOrTerminationPromise(),
        ensureNewDocumentNavigation ? watcher.newDocumentNavigationPromise() : watcher.sameDocumentNavigationPromise(),
        pageLoadingStoppedFunc(),
      ]);
    }
    watcher.dispose();
    helper.removeEventListeners(eventListeners);
    if (error)
      throw error;
    const request = requests.get(mainFrame._navigationURL);
    this._finished = true;
    return request ? request.response() : null;
    ...

    /* Not sure if this is the right approach for this function... */
    async function pageLoadingStopped() {
      const _this = this;
      return new Promise(function(resolve, reject) {
        const interval = setInterval(() => {
          if (_this._stopped || _this._finished) {
            clearInterval(interval);
            resolve();
          }
        }, 100);
      });
    }
  }

  async stopPageLoading() {
    await this._client.send('Page.stopLoading');
    this._stopped = true;
  }

This now waits for either page loading to finish or loading to be stopped, and does not hang at all.

Could this be added to the official Puppeteer API?
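
The same idea can be sketched without patching lib\Page.js and without the 100 ms polling loop, by letting the stop call settle a one-shot promise directly. StoppableNavigation and its members are hypothetical names, not part of Puppeteer, and the sketch assumes page.createCDPSession() is available in the installed version.

```javascript
// Hypothetical wrapper around a Puppeteer Page; none of these names exist in Puppeteer.
class StoppableNavigation {
  constructor(page) {
    this.page = page;
    // One-shot promise that stop() settles directly, so no setInterval polling is needed.
    this.stopped = new Promise((resolve) => {
      this.resolveStopped = resolve;
    });
  }

  // Resolves with the navigation response, or with null if stop() wins the race.
  // If the navigation fails before stop() is called, the error is thrown as usual.
  async goto(url, options = {}) {
    const navigation = this.page.goto(url, options);
    // If stop() wins, a later timeout rejection from goto must not go unhandled.
    navigation.catch(() => {});
    return Promise.race([navigation, this.stopped.then(() => null)]);
  }

  async stop() {
    // Assumption: page.createCDPSession() exists in the installed version.
    const session = await this.page.createCDPSession();
    try {
      await session.send('Page.stopLoading');
    } finally {
      await session.detach();
    }
    this.resolveStopped();
  }
}
```

A caller would create one wrapper per navigation, start goto without awaiting it, and call stop() from whatever background code decides the page has loaded enough. Whether the browser has really gone quiet after Page.stopLoading is not something this sketch verifies.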

RebliNk17

RebliNk17 commented on Oct 2, 2018

@RebliNk17
Author

@aslushnikov Any thought on the code I shared above?
It works as expected, but I'm not sure if there is a better approach...
If it's good, should I create a PR?

aslushnikov

aslushnikov commented on Oct 4, 2018

@aslushnikov
Contributor

@RebliNk17 sorry for the delay, I was busy with other stuff.

Any thought on the code I shared above?

Can we step back and re-iterate since I still don't understand what's not working.

If I understand correctly, there's a website that takes a lot of time to load. We want to constrain wait time to certain amount and get content from the page after this time. Is this correct?

If yes, why's the following not working for you?

const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // Constrain loading time to 30 seconds
    await page.goto('https://bestmodelsbrasil.blogspot.co.il', {waitUntil: 'networkidle0', timeout: 30000});
  } catch (e) {
    // Ignore the timeout; the content loaded so far is read below
  }
  console.log(await page.content());
  await browser.close();
})();
RebliNk17

RebliNk17 commented on Oct 11, 2018

@RebliNk17
Author

Sorry, I did not get any notification about your comment.

Your code will work, but sometimes the stop condition is not a fixed time; it can also depend on other code running in the background, as in my situation.

Adding this "stopPageLoading", which exists in the Chromium API, would make that possible...
It's something that Puppeteer should have...

RebliNk17

RebliNk17 commented on Oct 24, 2018

@RebliNk17
Author

@aslushnikov Any thoughts?
I see two people voted for this...

21 remaining items

nylen

nylen commented on Jun 3, 2019

@nylen

My specific use case: I was building a web archiving tool that (ideally) should work with arbitrary pages, and I found there are certain kinds of navigation timeouts that can be avoided or shortened, like when a page is stuck Connecting... to a resource that's in the main rendering path. I think the issue in the OP is similar.

I agree there are other things that could cause navigations after a page is "stopped". I am assuming that "aborting all current in-flight requests" is good enough for my use case, and so far it seems to be working. For this part page._client.send('Page.stopLoading') is fine, but it was a bit hard to track down the correct call. At least now that is documented in this issue.

So I am mostly just looking for potential ways to improve the code of puppeteer users here. Hence the suggestion to make page.goto aware of "navigation aborted" events, because I think this would allow getting rid of the Promise.race in the examples above.

I don't think any of this is particularly urgent. Thanks for all of your work on Puppeteer.

superryeti

superryeti commented on Jul 30, 2019

@superryeti

@aslushnikov Thank you for this

@RebliNk17 sorry for the delay, I was busy with other stuff.

Any thought on the code I shared above?

Can we step back and re-iterate since I still don't understand what's not working.

If I understand correctly, there's a website that takes a lot of time to load. We want to constrain wait time to certain amount and get content from the page after this time. Is this correct?

If yes, why's the following not working for you?

const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // Constrain loading time to 30 seconds
    await page.goto('https://bestmodelsbrasil.blogspot.co.il', {waitUntil: 'networkidle0', timeout: 30000});
  } catch (e) {
    // Ignore the timeout; the content loaded so far is read below
  }
  console.log(await page.content());
  await browser.close();
})();

I am using pyppeteer and had the same problem (I couldn't think of a way to get the DOM and cookies after a timeout). This solved my problem. I can access the DOM with

await page.content()

and cookies by

await page.cookies()

I don't understand what everyone else is complaining about. Again, thank you so much. Saved me a couple of hours.
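
A small sketch of that pattern in Puppeteer's JavaScript API, with one change: only timeout errors are swallowed and anything else is rethrown. snapshotWithin is a hypothetical helper name, and the check on error.name === 'TimeoutError' is an assumption about how the installed Puppeteer version names its timeout errors.

```javascript
// Hypothetical helper: load a URL with a time budget and return whatever is available.
async function snapshotWithin(page, url, timeoutMs) {
  let timedOut = false;
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
  } catch (error) {
    // Assumption: Puppeteer's timeout errors carry the name 'TimeoutError'.
    if (!error || error.name !== 'TimeoutError') throw error;
    timedOut = true;
  }
  // After a timeout the page object is still usable, so read what has loaded so far.
  const html = await page.content();
  const cookies = await page.cookies();
  return { timedOut, html, cookies };
}
```

The timedOut flag lets the caller tell a complete load from a partial one instead of treating both the same way.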

sheikalthaf

sheikalthaf commented on Dec 1, 2019

@sheikalthaf

@RebliNk17 what do you expect to see when you "stop" loading?

If you just want the navigation promise to not hang, I'd implement stopping somehow like this:

let stopCallback = null;
const stopPromise = new Promise(x => stopCallback = x);

const navigationPromise = Promise.race([
  page.goto(url).catch(e => void e),
  stopPromise
]);

// Do something; once you want to "stop" navigation, call `stopCallback`.
stopCallback();

I tried your solution and it works well, but when I try to take a screenshot I get this error:

error: Error: Protocol error (Page.captureScreenshot): Unable to capture screenshot
    at Promise (/node_modules/puppeteer/lib/Connection.js:183:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/node_modules/puppeteer/lib/Connection.js:182:12)
    at Page._screenshotTask (/node_modules/puppeteer/lib/Page.js:951:39)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Page.<anonymous> (/node_modules/puppeteer/lib/helper.js:111:15)
    at htmlBrowser (/dist/apps/botminds-browser/main.js:1079:45)
    at process._tickCallback (internal/process/next_tick.js:68:7)
Mister-Fil

Mister-Fil commented on Jan 11, 2023

@Mister-Fil

To stop page loading (and possibly other things; this can also close an alert()):

await page.keyboard.press('Escape')

If it doesn't work, repeat the line several times:

await page.keyboard.press('Escape')
await page.keyboard.press('Escape')
await page.keyboard.press('Escape')
otachkin

otachkin commented on Mar 31, 2023

@otachkin

Can someone help me stop this page from loading continuously?

https://mbd.baidu.com/newspage/data/landingpage?s_type=news&dsp=wise&context=%7B%22nid%22%3A%22news_9644758218931914527%22%7D&pageType=1&n_type=1&p_from=-1&rec_src=52

await page.evaluate(() => window.stop());

This is not working; Puppeteer just gets stuck.

wesleyscholl

wesleyscholl commented on Apr 24, 2023

@wesleyscholl

This worked for me:

await page.goto(this.url, { waitUntil: 'domcontentloaded' })

Thanks!

heaven

heaven commented on Nov 18, 2023

@heaven

@aslushnikov The problem is when setting a timeout with page.goto, even when it fails with TimeoutError, the page keeps running in the browser. This slows down the entire process. Sometimes browser.pages() takes 10+ seconds. Working in a Lambda environment leads to unpredictable behavior and various errors.

Whenever we reach the timeout, it would be great to have a way to stop the page immediately. I agree that stopping the ongoing requests most likely won't help much, but it would be better than nothing.

Here's an example:

Function Logs
START RequestId: 44a07ce9-8800-4d98-bc5f-fdb236a34202 Version: $LATEST
2023-11-18T19:03:59.438Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Launching the browser
2023-11-18T19:04:01.978Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Creating new incognito context
2023-11-18T19:04:01.980Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Opening new page
2023-11-18T19:04:02.033Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Loading page
2023-11-18T19:04:05.452Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Main frame navigated to:  ...
2023-11-18T19:04:27.043Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Timeout error, skipping the page
2023-11-18T19:04:27.043Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Cleaning up
2023-11-18T19:04:27.050Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	CLEANUP: Loading pages (await browser.pages())
2023-11-18T19:04:47.091Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	CLEANUP: Closing pages (await Promise.all(pages.map(p => p.close().catch(() => {}))))
2023-11-18T19:04:47.111Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	CLEANUP: disabling disconnect event handler (browser.off('disconnected'))
2023-11-18T19:04:47.111Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	CLEANUP: Disconnecting from the browser (await browser.disconnect())
2023-11-18T19:04:47.112Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	CLEANUP: Done
2023-11-18T19:04:47.112Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Gzipping
2023-11-18T19:04:47.113Z	44a07ce9-8800-4d98-bc5f-fdb236a34202	INFO	Responding
END RequestId: 44a07ce9-8800-4d98-bc5f-fdb236a34202
REPORT RequestId: 44a07ce9-8800-4d98-bc5f-fdb236a34202	Duration: 47680.63 ms	Billed Duration: 47681 ms	Memory Size: 1536 MB	Max Memory Used: 662 MB	Init Duration: 972.51 ms

You can see that await browser.pages() took 20 seconds. What's worse, with Lambda the page can keep running in the browser even after the function is restarted. So the next event starts opening a new page, and then context.newPage() takes an enormous amount of time. The timeout is set to 25 seconds but the job took almost 47.

kduffie

kduffie commented on Dec 14, 2023

@kduffie

Our product crawls our customer's website as part of our overall solution. We are using Puppeteer for this and, overall, it works great. But we have the same problem discussed here. We can't know a priori what the appropriate timeout behavior needs to be on any given page or site.

When page.goto throws a TimeoutError, it doesn't necessarily mean that the page is unusable -- but after catching the error we can't access the HttpResponse that is returned when there is no exception. If a new method, page.response(), for example, returned the response object if it is available, we'd be happy. I realize that in some timeout scenarios the response will not be available (such as if the timeout is at the network layer). It may also be a good idea for Puppeteer to emulate a "stop" when it throws an error, but I don't see that I need to be part of that.

So something like the following would be desirable:

const browser = await launch();
const page = await browser.newPage();
try {
  await page.goto(url, {waitUntil: ['load', 'networkidle2'], timeout: 30000});
} catch (err) {
  // perhaps check for errors other than timeout
}
// page.response() is the method proposed above, not an existing Puppeteer API
const response = await page.response();
if (response) {
  // use various information from the page itself, but also from response, such as url, status, headers
}
// clean up
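
Until something like page.response() exists, one possible workaround is to record the main document response from the page's 'response' event before calling page.goto, so it is still at hand if goto throws. This is a sketch under assumptions: trackMainResponse is a hypothetical helper name, and the filter relies on request.isNavigationRequest(), request.frame() and page.mainFrame() behaving as documented in the installed Puppeteer version.

```javascript
// Hypothetical helper: remember the latest main-frame navigation response.
function trackMainResponse(page) {
  let latest = null;
  const onResponse = (response) => {
    const request = response.request();
    // Keep only the navigation response of the top-level frame;
    // subresources and iframe navigations are ignored.
    if (request.isNavigationRequest() && request.frame() === page.mainFrame()) {
      latest = response;
    }
  };
  page.on('response', onResponse);
  return {
    // Returns null if no main-frame response has arrived yet.
    response: () => latest,
    // Removes the listener once the caller is done with the page.
    dispose: () => page.off('response', onResponse),
  };
}
```

The tracker would be created before page.goto and read in or after the catch block. It stays null when the timeout happens before any response headers arrive, which matches the network-layer caveat above.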
added a commit that references this issue on Apr 22, 2024
added a commit that references this issue on Apr 22, 2024
added a commit that references this issue on May 5, 2024
Mahmoud-Skafi

Mahmoud-Skafi commented on Jun 24, 2024

@Mahmoud-Skafi

For some reason this works for me:

await page
  .goto(url, { waitUntil: "domcontentloaded", timeout: 3000 })
  .catch((e) => void e);

await new Promise((resolve) => setTimeout(resolve, 3000));
await page.evaluate(() => window.stop());

Thanks to @chigix

sharee-tech

sharee-tech commented on Oct 1, 2024

@sharee-tech

I used .preventDefault() for this:

const labelElement = await page.getByRole('link', { name: 'See The Lakes' });

await labelElement.evaluate((element) => {
  element.addEventListener('click', (event) => {
    event.preventDefault();
  });
});

await labelElement.click();